Semi-supervised recursively partitioned mixture models for identifying cancer subtypes

Authors

  • Devin C. Koestler
  • Carmen J. Marsit
  • Brock C. Christensen
  • Margaret R. Karagas
  • Raphael Bueno
  • David J. Sugarbaker
  • Karl T. Kelsey
  • E. Andres Houseman

Abstract

MOTIVATION: Patients with identical cancer diagnoses often progress differently. This disparity in disease progression and treatment response may arise because two histologically similar cancers can be entirely different diseases at the molecular level. Methods for identifying cancer subtypes associated with patient survival can be powerful instruments for understanding the biochemical processes that underlie disease progression, and they provide an initial step toward more personalized therapy for cancer patients. We propose a method called semi-supervised recursively partitioned mixture models (SS-RPMM) that uses array-based genetic data and patient-level clinical data to find cancer subtypes associated with patient survival.

RESULTS: In the proposed SS-RPMM, cancer subtypes are identified using a selected subset of genes that are associated with survival time. Because survival information is used in the gene selection step, the method is semi-supervised. Unlike other semi-supervised clustering classification methods, SS-RPMM does not require specification of the number of cancer subtypes, which is often unknown. In a simulation study, the proposed method compared favorably with competing semi-supervised methods, including semi-supervised clustering and supervised principal components analysis. Furthermore, an analysis of mesothelioma cancer data using SS-RPMM revealed at least two distinct methylation profiles that are informative for survival.

AVAILABILITY: The analyses implemented in this article were carried out using R (http://www.r.project.org/).

CONTACT: [email protected]; [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
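
The two-stage idea described in the abstract (first keep only features associated with survival, then partition samples recursively so that the number of subtypes need not be fixed in advance) can be illustrated with a toy sketch. This is not the authors' SS-RPMM implementation, which was written in R and is model-based. Everything here is an assumption made for illustration: the feature score is a plain Pearson correlation with survival time that ignores censoring, the splitter is a basic 2-means step standing in for a two-class mixture model, the `min_gain` stopping threshold is hypothetical, and the data are invented.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def select_features(data, survival, k):
    """Supervised step: indices of the k features most correlated with survival.
    Assumption: a stand-in score that ignores censoring."""
    n_features = len(data[0])
    scores = [abs(pearson([row[j] for row in data], survival))
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(rows):
    """Within-group sum of squared distances to the centroid."""
    centroid = [mean(col) for col in zip(*rows)]
    return sum(dist2(row, centroid) for row in rows)

def two_means(rows, iters=20):
    """Unsupervised step: split rows in two with a basic 2-means.
    Deterministic start: the rows with the smallest and largest coordinate sums."""
    ordered = sorted(rows, key=sum)
    c0, c1 = ordered[0], ordered[-1]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [0 if dist2(r, c0) <= dist2(r, c1) else 1 for r in rows]
        g0 = [r for r, l in zip(rows, labels) if l == 0]
        g1 = [r for r, l in zip(rows, labels) if l == 1]
        if not g0 or not g1:
            break
        c0 = [mean(col) for col in zip(*g0)]
        c1 = [mean(col) for col in zip(*g1)]
    return labels

def recursive_partition(ids, rows, min_size=2, min_gain=0.9):
    """Split recursively; keep a split only if both halves have at least
    min_size samples and the relative SSE reduction reaches min_gain
    (a hypothetical threshold, not a likelihood-based criterion)."""
    if len(rows) < 2 * min_size:
        return [ids]
    labels = two_means(rows)
    left = [(i, r) for i, r, l in zip(ids, rows, labels) if l == 0]
    right = [(i, r) for i, r, l in zip(ids, rows, labels) if l == 1]
    if len(left) < min_size or len(right) < min_size:
        return [ids]
    total = sse(rows)
    split = sse([r for _, r in left]) + sse([r for _, r in right])
    if total == 0 or (total - split) / total < min_gain:
        return [ids]
    return (recursive_partition([i for i, _ in left], [r for _, r in left],
                                min_size, min_gain)
            + recursive_partition([i for i, _ in right], [r for _, r in right],
                                  min_size, min_gain))

# Invented toy data: 8 samples, 4 features; only features 0 and 1 track survival.
data = [
    [0.10, 0.15, 0.50, 0.30], [0.12, 0.10, 0.20, 0.70],
    [0.08, 0.12, 0.80, 0.40], [0.11, 0.14, 0.40, 0.60],
    [0.90, 0.85, 0.45, 0.35], [0.88, 0.92, 0.25, 0.65],
    [0.91, 0.88, 0.75, 0.45], [0.87, 0.90, 0.35, 0.55],
]
survival = [5, 7, 6, 8, 30, 28, 33, 29]  # invented survival times

selected = select_features(data, survival, k=2)
reduced = [[row[j] for j in selected] for row in data]
clusters = recursive_partition(list(range(len(data))), reduced)
print(selected, clusters)
```

The number of clusters is never passed in; it falls out of where the recursion stops. A model-based approach would replace the SSE threshold with a likelihood-based criterion and the correlation score with a survival model that accounts for censoring.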

Similar resources

Generalized mixture models, semi-supervised learning, and unknown class inference

In this paper, we discuss generalized mixture models and related semi-supervised learning methods, and show how they can be used to provide explicit methods for unknown class inference. After a brief description of standard mixture modeling and current model-based semi-supervised learning methods, we provide the generalization and discuss its computational implementation using three-stage expec...

Semi-supervised internet network traffic classification using a Gaussian mixture model

With a dramatic increase in the number and variety of applications running over the internet, it is very important to be capable of dynamically identifying and classifying flows/traffic according to their network applications. Meanwhile, internet application classification is fundamental to numerous network activities. In this paper, we present a novel methodology for identifying different inte...

A Note on the Descriptional Complexity of Semi-Conditional Grammars

The descriptional complexity of semi-conditional grammars is studied. A proof that every recursively enumerable language is gener...

Multi-Instance Mixture Models and Semi-Supervised Learning

Multi-instance (MI) learning is a variant of supervised learning where labeled examples consist of bags (i.e., multi-sets) of feature vectors instead of just a single feature vector. Under standard assumptions, MI learning can be understood as a type of semi-supervised learning (SSL). The difference between MI learning and SSL is that positive bag labels provide weak label information for the ins...

Journal title:
  • Bioinformatics

Volume 26, Issue 20

Pages  -

Publication date: 2010